Domain adaptation via class-balanced self-training with spatial priors

ABSTRACT

A vehicle, system and method of navigating a vehicle. The vehicle and system include a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain, determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain, perform semantic segmentation on the target image using the trained neural network to segment the target image and classify an object in the target image, and navigate the vehicle based on the classified object in the target image.

CROSS REFERENCE TO RELATED APPLICATIONS

This application claims priority from U.S. Provisional Application Ser. No. 62/578,005, filed on Oct. 27, 2017, the contents of which are incorporated herein by reference in their entirety.

INTRODUCTION

The subject disclosure relates to a system and method for adapting neural networks to perform semantic segmentation on images captured from a variety of domains, for autonomous driving and advanced driver-assistance systems (ADAS).

In autonomous vehicles and ADAS, one goal is to understand the surrounding environment such that information can be provided to either the driver or the vehicle itself to make decisions accordingly. One way to meet this goal is to capture digital images of the environment using an on-board digital camera and then identify objects and drivable spaces in the digital image using computer vision algorithms. Such identification tasks can be achieved by semantic segmentation, where pixels in the digital image are grouped and densely assigned labels corresponding to a predefined set of semantic classes (such as car, pedestrian, road, building, etc.). A neural network can be trained for semantic segmentation using training images with human-annotated labels. Often, due to limitations on annotation resources, the training images may cover only a small portion of the localities around the world, may contain images taken under certain weather conditions and during certain periods of the day, and may be collected by specific types of cameras. These limitations on the source of the training images are particular to the domain of the training images. However, it is quite common for a vehicle to be operated in a different domain. Since different domains can have different illumination, street styles, unseen objects, etc., a neural network trained in one domain does not always work well in another domain. Accordingly, it is desirable to provide a method of adapting a neural network trained for semantic segmentation in one domain so that the neural network operates effectively in another domain.

SUMMARY

In one exemplary embodiment, a method of navigating a vehicle is disclosed. The method includes determining a target segmentation loss for training a neural network to perform semantic segmentation on a target domain image, determining a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain, performing semantic segmentation on the target image using the trained neural network to segment the target image and classify an object in the target image, and navigating the vehicle based on the classified object in the target image.

The method further includes determining a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reducing a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain. The method can further include reducing the summation by adjusting parameters of the neural network and the value of the pseudo-label.

In various embodiments, determining the value of the pseudo-label of the target image includes reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes. Determining the target segmentation loss further includes multiplying a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class. The neural network can be trained using adversarial domain adaptation training and/or self-training domain adaptation training. Supervision of the training can include performing class-balancing for the target segmentation loss. A smoothness algorithm can be applied during the semantic segmentation of the target image.

In another exemplary embodiment, a navigation system for a vehicle is disclosed. The system includes a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain, determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain, perform semantic segmentation on the target image using the trained neural network to segment the target image and classify an object in the target image, and navigate the vehicle based on the classified object in the target image.

The processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reduce a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain. In one embodiment, the processor is further configured to reduce the summation by adjusting a parameter of the neural network and the value of the pseudo-label. The processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes. The processor is further configured to multiply a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class.

In yet another exemplary embodiment, a vehicle is disclosed. The vehicle includes a digital camera for capturing a target image of a target domain of the vehicle, and a processor. The processor is configured to determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain, determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain, perform semantic segmentation on the target image using the trained neural network and the pseudo-label to segment the target image and classify an object in the target image, and navigate the vehicle based on the classified object in the target image.

The processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reduce a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain.

In one embodiment, the processor is further configured to reduce the summation by adjusting a parameter of the neural network and the value of the pseudo-label. The processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes. The processor is further configured to multiply a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class to determine the target segmentation loss. The processor is further configured to apply a smoothness algorithm to the semantic segmentation of the target image.

The above features and advantages, and other features and advantages of the disclosure are readily apparent from the following detailed description when taken in connection with the accompanying drawings.

BRIEF DESCRIPTION OF THE DRAWINGS

Other features, advantages and details appear, by way of example only, in the following detailed description, the detailed description referring to the drawings in which:

FIG. 1 shows an illustrative trajectory planning system associated with a vehicle in accordance with various embodiments;

FIG. 2 shows an illustrative digital image obtained by an on-board digital camera of the vehicle as well as a semantically segmented image that corresponds to the digital image;

FIG. 3 schematically illustrates methods for training and operation of a neural network;

FIGS. 4A and 4B show various spatial priors that are obtained during training of the neural network in the source domain;

FIG. 5 shows an illustrative digital image obtained in a target domain for semantic segmentation;

FIG. 6 shows an unaided semantic segmentation image of the digital image; and

FIG. 7 shows a semantic segmentation image after neural network adaptation has been performed.

DETAILED DESCRIPTION

The following description is merely exemplary in nature and is not intended to limit the present disclosure, its application or uses. It should be understood that throughout the drawings, corresponding reference numerals indicate like or corresponding parts and features.

In accordance with an exemplary embodiment, FIG. 1 shows an illustrative trajectory planning system shown generally at 100 associated with a vehicle 10 in accordance with various embodiments. In general, system 100 determines a trajectory plan for automated driving. As depicted in FIG. 1, the vehicle 10 generally includes a chassis 12, a body 14, front wheels 16, and rear wheels 18. The body 14 is arranged on the chassis 12 and substantially encloses components of the vehicle 10. The body 14 and the chassis 12 may jointly form a frame. The wheels 16-18 are each rotationally coupled to the chassis 12 near a respective corner of the body 14.

In various embodiments, the vehicle 10 is an autonomous vehicle and the trajectory planning system 100 is incorporated into the autonomous vehicle 10 (hereinafter referred to as the autonomous vehicle 10). The autonomous vehicle 10 is, for example, a vehicle that is automatically controlled to carry passengers from one location to another. The vehicle 10 is depicted in the illustrated embodiment as a passenger car, but it should be appreciated that any other vehicle, including motorcycles, trucks, sport utility vehicles (SUVs), recreational vehicles (RVs), marine vessels, aircraft, etc., can also be used. In an exemplary embodiment, the autonomous vehicle 10 is a so-called Level Four or Level Five automation system. A Level Four system indicates “high automation”, referring to the driving mode-specific performance by an automated driving system of all aspects of the dynamic driving task, even if a human driver does not respond appropriately to a request to intervene. A Level Five system indicates “full automation”, referring to the full-time performance by an automated driving system of all aspects of the dynamic driving task under all roadway and environmental conditions that can be managed by a human driver.

As shown, the autonomous vehicle 10 generally includes a propulsion system 20, a transmission system 22, a steering system 24, a brake system 26, a sensor system 28, an actuator system 30, at least one data storage device 32, at least one controller 34, and a communication system 36. The propulsion system 20 may, in various embodiments, include an internal combustion engine, an electric machine such as a traction motor, and/or a fuel cell propulsion system. The transmission system 22 is configured to transmit power from the propulsion system 20 to the vehicle wheels 16-18 according to selectable speed ratios. According to various embodiments, the transmission system 22 may include a step-ratio automatic transmission, a continuously-variable transmission, or other appropriate transmission. The brake system 26 is configured to provide braking torque to the vehicle wheels 16-18. The brake system 26 may, in various embodiments, include friction brakes, brake by wire, a regenerative braking system such as an electric machine, and/or other appropriate braking systems. The steering system 24 influences a position of the vehicle wheels 16-18. While depicted as including a steering wheel for illustrative purposes, in some embodiments contemplated within the scope of the present disclosure, the steering system 24 may not include a steering wheel.

The sensor system 28 includes one or more sensing devices 40a-40n that sense observable conditions of the exterior environment and/or the interior environment of the autonomous vehicle 10. The sensing devices 40a-40n can include, but are not limited to, radars, lidars, global positioning systems, optical cameras, digital cameras, thermal cameras, ultrasonic sensors, and/or other sensors. The actuator system 30 includes one or more actuator devices 42a-42n that control one or more vehicle features such as, but not limited to, the propulsion system 20, the transmission system 22, the steering system 24, and the brake system 26. In various embodiments, the vehicle features can further include interior and/or exterior vehicle features such as, but not limited to, doors, a trunk, and cabin features such as air, music, lighting, etc. (not numbered).

The data storage device 32 stores data for use in automatically controlling the autonomous vehicle 10. In various embodiments, the data storage device 32 stores defined maps of the navigable environment. In various embodiments, the defined maps may be predefined by, and obtained from, a remote system. For example, the defined maps may be assembled by the remote system and communicated to the autonomous vehicle 10 (wirelessly and/or in a wired manner) and stored in the data storage device 32. The data storage device 32 further stores data and parameters for operating a neural network for semantic segmentation of digital images. Such data can include adaptation methods, spatial prior distribution data for features, and other data, as discussed herein. As can be appreciated, the data storage device 32 may be part of the controller 34, separate from the controller 34, or part of the controller 34 and part of a separate system.

The controller 34 includes at least one processor 44 and a computer readable storage device or media 46. The processor 44 can be any custom made or commercially available processor, a central processing unit (CPU), a graphics processing unit (GPU), an auxiliary processor among several processors associated with the controller 34, a semiconductor based microprocessor (in the form of a microchip or chip set), a macroprocessor, any combination thereof, or generally any device for executing instructions. The computer readable storage device or media 46 may include volatile and nonvolatile storage in read-only memory (ROM), random-access memory (RAM), and keep-alive memory (KAM), for example. KAM is a persistent or non-volatile memory that may be used to store various operating variables while the processor 44 is powered down. The computer-readable storage device or media 46 may be implemented using any of a number of known memory devices such as PROMs (programmable read-only memory), EPROMs (electrically PROM), EEPROMs (electrically erasable PROM), flash memory, or any other electric, magnetic, optical, or combination memory devices capable of storing data, some of which represent executable instructions, used by the controller 34 in controlling the autonomous vehicle 10.

The instructions may include one or more separate programs, each of which comprises an ordered listing of executable instructions for implementing logical functions. The instructions, when executed by the processor 44, receive and process signals from the sensor system 28, perform logic, calculations, methods and/or algorithms for automatically controlling the components of the autonomous vehicle 10, and generate control signals to the actuator system 30 to automatically control the components of the autonomous vehicle 10 based on the logic, calculations, methods, and/or algorithms. Although only one controller 34 is shown in FIG. 1, embodiments of the autonomous vehicle 10 can include any number of controllers 34 that communicate over any suitable communication medium or a combination of communication mediums and that cooperate to process the sensor signals, perform logic, calculations, methods, and/or algorithms, and generate control signals to automatically control features of the autonomous vehicle 10.

In various embodiments, one or more instructions of the controller 34 are embodied in the trajectory planning system 100 and, when executed by the processor 44, generate a trajectory output that addresses kinematic and dynamic constraints of the environment. For example, the instructions receive as input a digital image of the environment from an on-board digital camera and operate a neural network on the processor 44 in order to perform semantic segmentation on the digital image, thereby classifying and identifying objects in a field of view of the digital camera. The instructions can further perform a method of adapting the neural network to images obtained in different domains or at different localities. Methods for adaptation can include using spatial prior distributions determined during a training sequence for the neural network and smoothness operations. The controller 34 further controls the actuator system 30 and/or actuator devices 42a-42n in order to navigate the vehicle with respect to the identified objects.

The communication system 36 is configured to wirelessly communicate information to and from other entities 48, such as but not limited to, other vehicles (“V2V” communication), infrastructure (“V2I” communication), remote systems, and/or personal devices (described in more detail with regard to FIG. 2). In an exemplary embodiment, the communication system 36 is a wireless communication system configured to communicate via a wireless local area network (WLAN) using IEEE 802.11 standards or by using cellular data communication. However, additional or alternate communication methods, such as a dedicated short-range communications (DSRC) channel, are also considered within the scope of the present disclosure. DSRC channels refer to one-way or two-way short-range to medium-range wireless communication channels specifically designed for automotive use and a corresponding set of protocols and standards.

FIG. 2 shows an illustrative digital image 200 obtained by an on-board digital camera of the vehicle 10 as well as a segmented image 220 that corresponds to the digital image 200. In various embodiments, the image 220 can be segmented by an operator or created by a processor performing semantic segmentation. Semantic segmentation separates, delineates or classifies pixels in the digital image according to various classes that represent various objects, thereby allowing the processor to recognize these objects and their locations in the field of view of the digital camera. Semantic segmentation maps a pixel, by its color and its relationship with other pixels, into classes such as car 204, road 206, sidewalk 208 or sky 210. Additional pixel classes can include, but are not limited to, void, fence, terrain, truck, road, pole, sky, bus, sidewalk, traffic light, person, train, building, traffic sign, rider, motor, wall, vegetation, car, bike, etc.

FIG. 3 schematically illustrates methods for training and operation of a neural network. Mathematically, a neural network can be regarded as a complicated nonlinear function, with an image to be segmented serving as an input to the neural network, the network-predicted label maps serving as an output of the neural network, and the network parameters as coefficients that characterize the function. With the network parameters initialized to selected values, the neural network 306 is trained in a first domain, also referred to herein as a source domain 302. The neural network 306 is presented with one or more images (also referred to herein as “source images” 304) along with the manually annotated ground truth labelled images 320 for the source domain 302. The ground truth labelled images 320 provide direct observations of the environment that can be used to train the neural network 306. The neural network 306 performs prediction or semantic segmentation of the one or more source images 304 to obtain segmented network-predicted labelled images 308. To train the neural network, the network-predicted labelled images 308 are compared with the ground truth labelled images 320, using a loss function to quantitatively measure how much the network-predicted labelled images 308 differ from the ground truth labelled images 320. The training process refers to iteratively updating the parameters of the neural network 306 such that the loss is reduced and the network-predicted labelled images 308 gradually come to match the ground truth labelled images 320 closely. The trained neural network 306 is provided to a second domain, also referred to herein as a target domain 312. The neural network 306 performs semantic segmentation on target images 314 from the target domain 312 to obtain segmented labelled images 318. Due to differences between the source domain 302 and the target domain 312, such as different illumination, different geography, city vs. country, etc., the neural network 306 does not necessarily operate as well in the target domain 312 as it does in the source domain 302 in which it was trained. Various adaptation methods 316 are therefore employed with the neural network 306 in the target domain 312 in order to improve the quality of the segmented labelled images 318 in the target domain 312.

The neural network 306 is first trained by feeding source images 304 with ground truth labelled images 320 from the source domain 302 to the neural network 306. Training the neural network 306 is performed by adjusting one or more neural network parameters w to obtain a minimal value of a loss function representing a domain segmentation loss, i.e., a loss that occurs during the segmentation process, according to the network-predicted labelled image 308 and the ground truth labelled image 320. A segmentation loss is defined as a product of a ground truth pixel label with a logarithm of a predicted class probability. The domain segmentation loss is a summation of these products over every class and pixel, and every image of the source domain. An exemplary segmentation loss function is shown in Eq. (1):

$\begin{matrix}{\min\limits_{w}\left\{ {- {\sum\limits_{s = 1}^{S}{\sum\limits_{n = 1}^{N}{y_{s,n}^{T}{\log \left( {p_{n}\left( {w,I_{s}} \right)} \right)}}}}} \right\}} & (1)\end{matrix}$

where w is the neural network parameter, I_s is the source image 304, p_n is the predicted class probability of the n-th pixel of the source image 304 as determined by the neural network (or a probability that the n-th pixel belongs to a selected class), and y_{s,n}^T is the transpose of the pixel label y_{s,n}, a column vector for the n-th pixel. The pixel label y_{s,n} is generally a one-hot vector identifying the ground truth class of the n-th pixel. The logarithm of the predicted class probability is a negative number due to probabilities being between 0 and 1. Thus, the summations are multiplied by −1 prior to minimization.
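To make Eq. (1) concrete, the following minimal Python sketch computes the source segmentation loss for a single image; the array shapes, the function name, and the small stabilizing constant are illustrative assumptions rather than part of the disclosure:

```python
import numpy as np

def source_segmentation_loss(probs, labels):
    """Cross-entropy segmentation loss of Eq. (1) for one source image.

    probs:  (N, C) array; probs[n, c] is the predicted probability p_n
            that pixel n belongs to class c (each row sums to 1).
    labels: (N,) array of ground truth class indices, i.e. the one-hot
            vectors y_{s,n} stored compactly as integers.
    """
    n_pixels = probs.shape[0]
    # y_{s,n}^T log(p_n) picks out the log-probability of the true class.
    log_p_true = np.log(probs[np.arange(n_pixels), labels] + 1e-12)
    # The log-probabilities are negative, so the sum is multiplied by -1
    # to yield a nonnegative quantity to minimize over w.
    return -np.sum(log_p_true)
```

In practice this quantity is summed over all S source images and reduced by gradient descent on the network parameters w.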

The network is then trained using adversarial training on both the source images 304 and target images 314 to improve the prediction performance of the neural network 306 on images from the target domain 312. The domain adversarial training is formulated as the optimization problem below:

$\begin{matrix}{L_{total} = L_{seg} - \lambda_{A}L_{A},\ \text{where}} & (2) \\ {L_{A} = \max\limits_{w_{F}}\min\limits_{\theta}\left\{ {\sum\limits_{s = 1}^{S}{\sum\limits_{n = 1}^{N_{S}}{\log\left( p_{n}\left( w_{F},\theta,I_{s} \right) \right)}}} - {\sum\limits_{t = 1}^{T}{\sum\limits_{n = 1}^{N_{S}}{\log\left( 1 - p_{n}\left( w_{F},\theta,I_{t} \right) \right)}}} \right\}} & (3) \\ {L_{seg} = \max\limits_{w_{F},w_{S}}\left\{ \sum\limits_{s = 1}^{S}{\sum\limits_{n = 1}^{N_{S}}{y_{s,n}^{T}\log\left( p_{n}\left( w_{F},w_{S},I_{s} \right) \right)}} \right\}} & (4)\end{matrix}$

where p_n(w_F, θ, I_{s/t}) is the probability of the n-th pixel in an image I_{s/t} being predicted as coming from the source domain, and I_{s/t} indicates that the image is from the source/target domain. The index t ∈ {1, 2, . . . , T} runs over target images and n ∈ {1, 2, . . . , N} over pixels. The parameter θ belongs to the domain discriminating network, which is built on top of the feature generation network having the neural network parameter w_F. Parameter w_S is the neural network parameter corresponding to the segmentation portion of the network; the parameters w_F and w_S together form the segmentation network.

The above equations (2)-(4) can be solved by the following iterative process: 1) train a domain discriminator to distinguish features of the source domain from features of the target domain by solving the inner minimization problem of Eq. (3) via stochastic gradient descent; and 2) train the feature extraction network w_F and the segmentation network w_S by solving the outer maximization of Eq. (3) combined with Eq. (4).
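One common realization of this alternating scheme is sketched below in PyTorch. The toy single-layer networks, optimizer settings, and value of λ_A are placeholders, and the standard "fool the discriminator" objective is used as a stand-in for the exact sign conventions of Eqs. (2)-(4):

```python
import torch
import torch.nn as nn

# Toy stand-ins for the feature generation network (w_F), the segmentation
# network (w_S), and the domain discriminator (theta); real models are deep CNNs.
feature_net = nn.Sequential(nn.Conv2d(3, 8, 3, padding=1), nn.ReLU())  # w_F
seg_net = nn.Conv2d(8, 19, 1)                                          # w_S
discriminator = nn.Sequential(nn.Conv2d(8, 1, 1), nn.Sigmoid())        # theta

opt_d = torch.optim.SGD(discriminator.parameters(), lr=1e-3)
opt_fs = torch.optim.SGD(
    list(feature_net.parameters()) + list(seg_net.parameters()), lr=1e-3)
seg_ce, bce, lambda_a = nn.CrossEntropyLoss(), nn.BCELoss(), 0.1

def adversarial_step(src_img, src_label, tgt_img):
    f_src, f_tgt = feature_net(src_img), feature_net(tgt_img)

    # Step 1 (inner problem of Eq. (3)): train the discriminator to tell
    # source features (label 1) from target features (label 0).
    d_src = discriminator(f_src.detach())
    d_tgt = discriminator(f_tgt.detach())
    d_loss = bce(d_src, torch.ones_like(d_src)) + bce(d_tgt, torch.zeros_like(d_tgt))
    opt_d.zero_grad()
    d_loss.backward()
    opt_d.step()

    # Step 2 (outer problem of Eq. (3) combined with Eq. (4)): train w_F and
    # w_S to segment the source image while making target features look "source".
    seg_loss = seg_ce(seg_net(f_src), src_label)
    d_tgt = discriminator(f_tgt)
    adv_loss = bce(d_tgt, torch.ones_like(d_tgt))
    total = seg_loss + lambda_a * adv_loss  # plays the role of L_total in Eq. (2)
    opt_fs.zero_grad()
    total.backward()
    opt_fs.step()
```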

Once the neural network parameter w has been determined by domain adversarial training, self-training based domain adaptation is further used to better adapt the network to the target domain. Similar to domain adversarial training, self-training based domain adaptation helps to improve the effectiveness of the neural network in the target domain by incorporating target images in multiple rounds or iterations of network training, without requiring human-annotated ground truths. However, unlike domain adversarial training, self-training based domain adaptation adopts a loss function minimization or reduction training framework similar to the traditional network training of Eq. (1), without the adversarial step of domain adversarial training. Since the target domain ground truths are not available, self-training based domain adaptation generates network predictions on target images and incorporates the most confident predictions into network training as approximated target ground truths (herein referred to as pseudo-labels). Once the network parameters are updated, the updated network regenerates the pseudo-labels on the target images and incorporates them in the next round of network training. This process is repeated for multiple rounds. Mathematically, each round of pseudo-label generation and network training can be formulated as minimizing the loss function shown in Eq. (5) below.
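The round structure just described can be outlined as follows; `make_pseudo_labels` and `train_round` are hypothetical callables standing in for the pseudo-label and retraining steps detailed in the remainder of this section:

```python
def self_training(model, source_data, target_images,
                  make_pseudo_labels, train_round, num_rounds=3):
    """Iterate pseudo-label generation and retraining for several rounds."""
    for _ in range(num_rounds):
        # Keep only the most confident target predictions as approximated
        # ground truths (pseudo-labels).
        pseudo_labels = make_pseudo_labels(model, target_images)
        # Retrain on source ground truths plus the fixed target pseudo-labels,
        # reducing the joint loss of Eq. (5).
        model = train_round(model, source_data, target_images, pseudo_labels)
    return model
```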

Once the neural network parameter w has been determined, it is used to perform semantic segmentation on target images from the target domain. Domain adaptation methods are used to adapt the neural network to the target domain, thereby improving the effectiveness of the neural network in the target domain. In order to perform domain adaptation in the target domain, a second loss function is minimized that describes a summation of a segmentation loss in the source domain and a segmentation loss in the target domain. A representative loss function for the process of domain adaptation is shown in Eq. (5):

$\begin{matrix}{\min\limits_{\hat{y},w}\left\{ - \left\lbrack {\sum\limits_{s = 1}^{S}{\sum\limits_{n = 1}^{N}{y_{s,n}^{T}\log\left( p_{n}\left( w,I_{s} \right) \right)}}} + {\sum\limits_{t = 1}^{T}{\sum\limits_{n = 1}^{N}{\sum\limits_{c = 1}^{C}{{\hat{y}}_{t,n}^{(c)}\log\left( p_{n}\left( c \mid w,I_{t} \right) \right)}}}} + {\sum\limits_{c = 1}^{C}{k_{c}{\hat{y}}_{t,n}^{(c)}}} \right\rbrack \right\}} & {\text{Eq.}\ (5)}\end{matrix}$

such that

$\begin{matrix}{{\hat{y}}_{t,n} \in \left\{ e \mid e \in E^{C} \right\} \cup \left\{ \mathbf{0} \right\}} & {\text{Eq.}\ (6)} \\ {k_{c} > 0,\ \forall c} & {\text{Eq.}\ (7)}\end{matrix}$

where E^C denotes the set of C-dimensional one-hot vectors, so that each pseudo-label is either a one-hot vector or the all-zero vector.

where I_t is the target image in the target domain, and p_n is the predicted class probability. The term p_n(c|w, I_t) is the probability that the n-th pixel of the target image I_t (as determined by the neural network having parameter w) is in class c. The segmentation loss in the source domain is represented by the first term (having summations over S and N) and the segmentation loss in the target domain is represented by the second term (having summations over T, N and C). The class term c appears only in the second term (i.e., the target domain). In the second term, the predicted class probability is multiplied by a pseudo-label ŷ_{t,n}^{(c)}. The pseudo-label ŷ_{t,n}^{(c)} is a scalar value for the n-th pixel in class c. The pseudo-label ŷ_{t,n}^{(c)} is a variable of the loss function that is adjusted in order to minimize the loss function of Eq. (5). Once the pseudo-labels have been determined, the target images can be incorporated into network training by minimizing Eq. (5) with respect to the network parameters w while fixing the pseudo-labels; the retrained network is then used to perform semantic segmentation of the target image.

The third term Σ_{c=1}^{C} k_c ŷ_{t,n}^{(c)} is a constraint term that prevents the minimum value of the loss function from being zero or providing a trivial solution. Therefore, minimizing the loss function of Eq. (5) includes determining a local minimum of the loss function rather than an absolute minimum of the loss function. The parameter k_c is a threshold value that supervises the training process on the target domain by controlling a strictness of the pseudo-label generation process for class c. In particular, supervision refers to controlling the values of k_c for each class in order to provide a constraint on the particular class, leading to a class-balanced framework for performing the neural network training, such as in self-training domain adaptation training. The selection of values can be used to prevent large classes (i.e., classes that include a large portion of the pixels) from overwhelming small classes (i.e., classes that include few pixels) and to prevent the small classes from being subsumed by larger classes. As an illustrative example, large classes can include sky, road, buildings, etc., while small classes can include stop signs, telephone poles, etc. In a framework in which the neural network is self-training, one can count the frequency of occurrence of each class in images from the source domain and find, for each class, the threshold at which the proportion of pixels with predicted probabilities of that class greater than the threshold is equal to the source domain frequency. This threshold is then used to set the parameter k_c. Selecting different parameter values k_c for each class provides supervision to the training of the neural network in the target domain by constraining the classes from changing size when segmenting the target images; a sketch of this threshold selection and of the resulting pseudo-label generation follows.
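A minimal sketch of both steps is given below, assuming flattened arrays of source ground truths and target predictions; the correspondence k_c = −log(threshold_c), the stabilizing constant, and the encoding of the all-zero pseudo-label as −1 are illustrative assumptions consistent with the selection rule implied by Eq. (5):

```python
import numpy as np

def class_balanced_thresholds(source_labels, target_probs):
    """Pick k_c so that, for each class c, the fraction of target pixels whose
    predicted probability for c exceeds the threshold equals the frequency of
    class c in the source domain ground truths.

    source_labels: (M,) ground truth class indices from source images.
    target_probs:  (N, C) predicted class probabilities at target pixels.
    """
    n_classes = target_probs.shape[1]
    k = np.zeros(n_classes)
    for c in range(n_classes):
        freq = np.mean(source_labels == c)          # source frequency of class c
        # The (1 - freq) quantile leaves a `freq` fraction of pixels above it.
        thresh = np.quantile(target_probs[:, c], 1.0 - freq)
        k[c] = -np.log(thresh + 1e-12)              # k_c acts on log-probabilities
    return k

def generate_pseudo_labels(target_probs, k):
    """Minimize Eq. (5) over the pseudo-labels with w held fixed: each pixel
    gets the one-hot label of the class maximizing log p + k_c, or the
    all-zero label (encoded here as -1, an ignored pixel) if no class
    clears zero."""
    scores = np.log(target_probs + 1e-12) + k       # (N, C)
    best = scores.argmax(axis=1)
    return np.where(scores.max(axis=1) > 0, best, -1)
```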

In another aspect, the methods disclosed herein use spatial prior distributions to reduce the summation of the segmentation loss in the target domain and the segmentation loss in the source domain. Despite the variations between source domains and target domains, various features or objects tend to occur in the same or similar locations within digital images regardless of domain. For example, sky often occupies the upper part of the image, while road and sidewalk often stay at the bottom part. The probability distribution of these features in an image can be provided in a scalar field referred to herein as a spatial prior distribution. Spatial prior distributions are generally determined from images in the source domain when training the neural network and are then stored in the storage medium for use in the target domain. When the neural network is segmenting the target image, the spatial prior distribution can be used along with the target image in order to improve class probabilities in the target domain.
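A spatial prior can be estimated by averaging, over the source images, the per-pixel indicator of each class in the ground truth label maps. The sketch below assumes the label maps have been resized to a common resolution and applies the normalization that appears below as Eq. (10):

```python
import numpy as np

def spatial_priors(source_label_maps, n_classes):
    """Estimate a spatial prior q^(c) for each class from source ground truths.

    source_label_maps: (S, H, W) integer ground truth label maps, resized to
    a common resolution (an assumption of this sketch).
    Returns a (C, H, W) array with each class prior summing to 1/C.
    """
    _, h, w = source_label_maps.shape
    priors = np.zeros((n_classes, h, w))
    for c in range(n_classes):
        # Fraction of source images in which pixel (i, j) carries class c.
        priors[c] = (source_label_maps == c).mean(axis=0)
    # Normalize so that each class prior sums to 1/C over the image.
    totals = priors.sum(axis=(1, 2), keepdims=True)
    return priors / (totals * n_classes + 1e-12)
```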

FIGS. 4A and 4B show various spatial priors that are obtained during training of the neural network in the source domain. The figures illustrate spatial priors of 19 different classes. In the top row of FIG. 4A, the classes, from left to right, are road 401, sidewalk 402, building 403, and wall 404. In the second row, from left to right, the classes are fence 405, pole 406, traffic light 407, and traffic sign 408. In the third row, from left to right, the classes are tree vegetation 409, terrain 410, sky 411, and person 412. Continuing in the top row of FIG. 4B, from left to right, the classes are rider 413, car 414, truck 415, and bus 416. In the second row of FIG. 4B, from left to right, the classes are train 417, motorcycle 418 and bicycle 419. Each spatial prior distribution is shown for a digital image that is 2000 pixels across and 1000 pixels in height, although the digital image can have any particular dimension or aspect ratio in various embodiments. The light areas of a spatial prior distribution indicate locations of high probability of occurrence of the feature, while the dark areas indicate locations of low probability of occurrence. The gray scale indicating the probabilities is shown to the right of each spatial prior.

As an example, the spatial prior for a sidewalk 402 indicates that sidewalks tend to appear near the bottom or side of the image. The spatial prior for the sky 411 indicates that the sky tends to appear near the top and center of the image. The spatial priors for buildings 403 and tree vegetation 409 indicate that buildings and tree vegetation tend to run across the top of images.

In one embodiment, the spatial prior distributions can be input into the loss function in order to provide another term that refines the semantic segmentation process in the target domain. In various embodiments, the spatial prior distribution is multiplied by the predicted class probability p_n, and the target segmentation loss is determined from this product. An exemplary loss function that involves spatial prior distributions is shown in Eq. (8):

$\begin{matrix}{\min\limits_{\hat{y},w}\left\{ - \left\lbrack {\sum\limits_{s = 1}^{S}{\sum\limits_{n = 1}^{N}{y_{s,n}^{T}\log\left( p_{n}\left( w,I_{s} \right) \right)}}} + {\sum\limits_{t = 1}^{T}{\sum\limits_{n = 1}^{N}{\sum\limits_{c = 1}^{C}{{\hat{y}}_{t,n}^{(c)}\log\left( p_{n}\left( c \mid w,I_{t} \right)q_{n}^{(c)} \right)}}}} + {\sum\limits_{c = 1}^{C}{k_{c}{\hat{y}}_{t,n}^{(c)}}} \right\rbrack \right\}} & {\text{Eq.}\ (8)}\end{matrix}$

such that

$\begin{matrix}{{\hat{y}}_{t,n} \in \left\{ e \mid e \in E^{C} \right\} \cup \left\{ \mathbf{0} \right\}} & {\text{Eq.}\ (9)} \\ {\sum\limits_{n}q_{n}^{(c)} = 1/C} & {\text{Eq.}\ (10)} \\ {k_{c} > 0,\ \forall c} & {\text{Eq.}\ (11)}\end{matrix}$
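Relative to the solver for Eq. (5), the pseudo-label step for Eq. (8) scores each class by log(p_n(c) · q_n^{(c)}) + k_c, so that classes that rarely occur at a pixel's image location are down-weighted. A minimal sketch for one target image follows; the −1 encoding of the all-zero pseudo-label, the stabilizing constant, and the reuse of the zero-crossing selection rule from Eq. (5) are illustrative assumptions:

```python
import numpy as np

def generate_pseudo_labels_sp(target_probs, priors, k):
    """Spatial-prior pseudo-label solver corresponding to Eq. (8).

    target_probs: (C, H, W) predicted class probabilities for one target image.
    priors:       (C, H, W) spatial prior distributions q^(c).
    k:            (C,) class-balancing parameters k_c.
    """
    # Class score per pixel: log(p * q) + k_c, per the second term of Eq. (8).
    scores = np.log(target_probs * priors + 1e-12) + k[:, None, None]
    best = scores.argmax(axis=0)                    # (H, W) best-class map
    # Assign the all-zero pseudo-label (-1, ignored) where no class clears zero.
    return np.where(scores.max(axis=0) > 0, best, -1)
```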

In another aspect, smoothness of the segmentation that occurs in the source domain can be used to provide smoothing in segmentation images in the target domain. Pixels that have similar features and are grouped in the same class in the source domain should be grouped together in the target domain.

FIG. 5 shows an illustrative digital image 500 obtained in a target domain for semantic segmentation. The image 500 includes various feature classes, such as sky 502, vehicle 504, road 506 and vehicle hood 508.

FIG. 6 shows an unaided semantic segmentation image 600 of the digital image 500. The sky class 602 clearly takes up less of the segmentation image 600 than the sky 502 does of the digital image 500. Also, the vehicle 504 of digital image 500 is represented by two different feature classes, labelled 604a and 604b, in the segmentation image 600. The road class 606 takes up only a portion of the segmentation image 600, whereas the corresponding road 506 reaches from the left side to the right side of the digital image 500. Also, the hood class 608 appears much larger in the segmentation image 600 than the corresponding hood 508 does in the digital image 500.

FIG. 7 shows a semantic segmentation image 700 after neural network adaptation (such as that of Eq. (5)) has been performed. The class features of image 700 are better proportioned to the features of the original image 500 than are the class features of image 600. In particular, the sky 702 more closely represents the sky 502 of image 500 than does the sky 602 of image 600. The vehicle 504 is represented by a single class 704 in image 700. The road class 706 takes up much more of the image 700, just as the road 506 does of image 500. Additionally, the hood class 708 has been reduced to conform more closely to the size of the hood 508 of image 500.

While the above disclosure has been described with reference to exemplary embodiments, it will be understood by those skilled in the art that various changes may be made and equivalents may be substituted for elements thereof without departing from its scope. In addition, many modifications may be made to adapt a particular situation or material to the teachings of the disclosure without departing from the essential scope thereof. Therefore, it is intended that the disclosure not be limited to the particular embodiments disclosed, but will include all embodiments falling within the scope of the application.

What is claimed is:
1. A method of navigating a vehicle, comprising: determining a target segmentation loss for training a neural network to perform semantic segmentation on a target domain image; determining a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain; performing semantic segmentation on the target image using the trained neural network to segment the target image and classify an object in the target image; and navigating the vehicle based on the classified object in the target image.
2. The method of claim 1, further comprising determining a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reducing a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain.
3. The method of claim 2, further comprising reducing the summation by adjusting parameters of the neural network and the value of the pseudo-label.
4. The method of claim 1, further comprising determining the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes.
5. The method of claim 1, wherein determining the target segmentation loss further comprises multiplying a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class.
6. The method of claim 1, further comprising training the neural network using adversarial domain adaptation training.
7. The method of claim 1, further comprising training the neural network using self-training domain adaptation training.
8. The method of claim 1, wherein supervision of the training further comprises performing class-balancing for the target segmentation loss.
9. The method of claim 1, further comprising applying a smoothness algorithm to the semantic segmentation of the target image.
10. A navigation system for a vehicle, comprising: a digital camera for capturing a target image of a target domain of the vehicle; and a processor configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain; perform semantic segmentation on the target image using the trained neural network to segment the target image and classify an object in the target image; and navigate the vehicle based on the classified object in the target image.
11. The navigation system of claim 10, wherein the processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reduce a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain.
12. The navigation system of claim 11, wherein the processor is further configured to reduce the summation by adjusting a parameter of the neural network and the value of the pseudo-label.
13. The navigation system of claim 10, wherein the processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes.
14. The navigation system of claim 10, wherein the processor is further configured to multiply a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class.

15. A vehicle, comprising: a digital camera for capturing a target image of a target domain of the vehicle; and a processor configured to: determine a target segmentation loss for training a neural network to perform semantic segmentation of the target image in the target domain; determine a value of a pseudo-label of the target image by reducing the target segmentation loss while providing a supervision of the training over the target domain; perform semantic segmentation on the target image using the trained neural network and the pseudo-label to segment the target image and classify an object in the target image; and navigate the vehicle based on the classified object in the target image.

16. The vehicle of claim 15, wherein the processor is further configured to determine a source segmentation loss for training the neural network to perform semantic segmentation on a source domain image, and reduce a summation of the source segmentation loss and the target segmentation loss while providing the supervision of the training over the target domain.
17. The vehicle of claim 16, wherein the processor is further configured to reduce the summation by adjusting a parameter of the neural network and the value of the pseudo-label.
18. The vehicle of claim 15, wherein the processor is further configured to determine the value of the pseudo-label of the target image by reducing the target segmentation loss over a plurality of segmentation classes while providing the supervision to each of the plurality of segmentation classes.
19. The vehicle of claim 15, wherein the processor is further configured to multiply a spatial prior distribution for a segmentation class by a class probability of a pixel being in the segmentation class to determine the target segmentation loss.
20. The vehicle of claim 15, wherein the processor is further configured to apply a smoothness algorithm to the semantic segmentation of the target image.